The Intrinsic Fraction of Broad Absorption Line Quasars
ABSTRACT

We carefully reconsider the problem of classifying broad absorption line quasars 
(BALQSOs) and derive a new, unbiased estimate of the intrinsic BALQSO fraction 
from the SDSS DR3 QSO catalogue. We first show that the distribution of objects 
selected by the so-called "absorption index" (AI) is clearly bimodal in log AI, with
only one mode corresponding to definite BALQSOs. The surprisingly high BALQSO
fractions that have recently been inferred from AI-based samples are therefore likely to
be overestimated. We then present two new approaches to the classification problem 
that are designed to be more robust than the AI, but also more complete than the 
traditional "balnicity index" (BI). Both approaches yield observed BALQSO fractions 
around 13.5%, while a conservative third approach suggests an upper limit of 18.3%. 
Finally, we discuss the selection biases that affect our observed BALQSO fraction. Af- 
ter correcting for these biases, we arrive at our final estimate of the intrinsic BALQSO 
fraction. This is f_BALQSO = 0.17 ± 0.01 (stat) ± 0.03 (sys), with an upper limit of
f_BALQSO ≈ 0.23. We conclude by pointing out that the bimodality of the log AI distribution
may be evidence that the BAL-forming region has clearly delineated physical
boundaries. 
1 INTRODUCTION

Broad absorption line quasars (BALQSOs) are a sub-class of active galactic nuclei
(AGN) that exhibit strong, broad and blue-shifted spectroscopic absorption features
(Foltz et al. 1990; Weymann et al. 1991; Reichard et al. 2003). Most BALQSOs - the
so-called HiBALs - only display absorption troughs in certain high-ionisation lines
(e.g. N V λ1240 Å, C IV λ1549 Å, Si IV λ1397 Å), but some - the so-called LoBALs -
also show absorption in some low-ionisation lines (most notably Mg II λ2800 Å).
BALQSOs are predominantly radio-quiet (Stocke et al. 1992; Becker et al. 2001;
Shankar, Dai & Sivakoff 2008), and there are also subtle differences between their
continuum and emission line properties and those of "normal" (non-BAL) QSOs
(Reichard et al. 2003). However, despite these differences, BALQSOs and non-BAL
QSOs appear to be drawn from the same parent population (Reichard et al. 2003).

The simplest and most promising interpretation of the 
QSO/BALQSO dichotomy is in terms of an orientation ef- 
fect. This fits in well with unified models, in which orienta- 
tion is the major factor determining the observational ap- 
pearance of AGN (e.g. Elvis 2000). It also makes sense phys- 
ically, since the absorption troughs in BALQSOs have long 
been recognised as signatures of fast, large-scale outflows from the central
engines. More specifically, blue-shifted absorption is produced when the central
continuum and/or emission line source is viewed through outflowing material that
scatters photons out of the observer's line of sight. If the outflow subtends a
solid angle 0 < Ω < 2π, then both BALQSOs and non-BAL QSOs can be accounted for in
this picture.

The powerful outflows we observe in BALQSOs are an important example of AGN
feedback. Such feedback is the key ingredient in theoretical attempts to understand
galaxy "downsizing" and may also be responsible for regulating the growth of
supermassive black holes, quenching star formation and setting up the M_BH-σ and
M_BH-M_bulge relations (e.g. Silk & Rees 1998; King 2003; di Matteo, Springel &
Hernquist 2005; Scannapieco, Silk & Bouwens 2005). However, despite their
fundamental importance, the geometry, kinematics and energetics of BALQSO outflows
have remained highly uncertain.

Perhaps the single most important quantity that can be determined empirically
regarding BALQSOs is their incidence within the overall QSO population. More
specifically, the BALQSO fraction (f_BALQSO) is defined as the fraction of QSOs that
display BALQSO absorption features. Its significance derives mainly from the fact
that it allows a simple, geometric interpretation: in the context of unified
schemes, f_BALQSO is the covering fraction of BALQSO outflows.

Until recently, searches for BALQSOs in quasar surveys consistently reported
observed BALQSO fractions around 10%-15% (e.g. Weymann et al. 1991). It therefore
came as a surprise when Trump et al. (2006) reported a significantly higher BALQSO
fraction of 26% from the spectroscopic QSO catalogue associated with the 3rd Data
Release (DR3) of the Sloan Digital Sky Survey (SDSS; Schneider et al. 2005).

There can be little question that the QSO sample on which the Trump et al. study is
based is superior to earlier QSO surveys. However, this is not the reason for their
unusually high estimate of f_BALQSO. Instead, Trump et al. argue that the "classic"
definition of BALQSOs, based on the so-called "balnicity index" (hereafter, BI), is
not appropriate for BALQSO classification purposes. They prefer a different
statistic, the so-called "absorption index" (hereafter, AI). The AI is designed to
be less strict than the BI, with the result that a significantly higher fraction of
QSOs are classified as broad absorption line (BAL) objects. In essence, Trump et al.
(2006) argue that BALs can be both weaker and much narrower than has previously been
supposed. QSOs containing such features would naturally be excluded from any census
involving the classic BI definition.

If Trump et al. are correct, the covering fraction of 
BALQSO outflows must be much larger than has previ- 
ously been assumed. Indeed, it has been suggested that 
their observed BALQSO fraction of 26% implies an intrin- 
sic BALQSO fraction of 43%±2% once selection effects are 
taken into account (Dai, Shankar & Sivakoff 2008). This is 
about twice the best previous estimates (22%±4% [Hewett 
& Foltz 2003]; 15.9% ±1.4% [Reichard et al. 2003]). 

The main goals of the present paper are to take a fresh look at the metrics used
for classifying BALQSOs and to derive a new, robust estimate of f_BALQSO. It is
worth emphasizing from the outset that what we wish to accomplish is to identify a
distinct sub-population of QSOs of which classic BALQSOs (with BI > 0 km s^-1) are
just the most obvious representatives. This is an important point, because - as
effectively argued by Trump et al. (2006) - these classic BALQSOs may just be the
tip of the iceberg. Thus the very term "broad absorption line quasar" could be a
misnomer, since it is possible that the majority of objects belonging to this
population could in principle exhibit only weak/narrow absorption features (or even
no absorption at all). It could even turn out that a distinct BALQSO sub-population
does not exist: QSOs could simply exhibit a perfectly continuous
and smooth distribution of absorption characteristics, with 
classic BALQSOs occupying the arbitrarily defined extreme 
tail of this distribution. As we shall see, there is, in fact, 
evidence that BALQSOs do form a distinct sub-population. 
With this in mind, we will use the term BALQSO through- 
out this paper to denote members of this sub-population, re- 
gardless of whether they are identified as such by any given 
metric. The goal, in fact, is to find ways of quantifying the 
size of this population in a way that is simultaneously robust 
(i.e. does not produce many false positives) and complete 
(i.e. does not miss many true members). 

In Section 2, we introduce and compare the widely used AI and BI metrics for
identifying BALQSOs. In Section 3, we show that there is clear evidence for
bimodality in the log AI distribution of Trump et al.'s BALQSO candidates, with
"classic" BALQSOs (with positive BI) preferentially occupying one mode of the
distribution. In Section 4, we present several concrete examples of problematic
classifications obtained with both standard metrics. In Section 5, we present two
new approaches to the classification problem, which are designed to be more robust
than the AI, but more complete than the BI. In Section 6, we correct the observed
BALQSO fractions produced by our new approaches for selection effects and obtain our
final estimate of the intrinsic BALQSO fraction. Finally, in Section 7, we discuss
our results and present our conclusions.



2 HOW BROAD IS BROAD? METRICS FOR 
IDENTIFYING BROAD ABSORPTION LINE 
QUASARS 

The BI (Weymann et al. 1991) was the first quantitative metric used to identify
BALQSOs within QSO surveys. Until the introduction of the AI (see below), the BI
remained the standard way to classify objects as BALQSOs. Given a
continuum-normalised spectrum in the vicinity of a spectral line, the BI is defined
numerically as



\[
\mathrm{BI} = \int_{3000}^{25000} \left[ 1 - \frac{f(v)}{0.9} \right] C \, dv. \tag{1}
\]



Here, the limits of the integral are in units of km s^-1, and f(v) is the normalised
flux as a function of velocity displacement from line centre (see footnote 1). The
constant C = 0 everywhere, unless the normalised flux has satisfied f(v) < 0.9
continuously for at least 2000 km s^-1; at this point it is switched to C = 1 until
f(v) > 0.9 again. Based on this definition, objects are classified as BALQSOs if
their BI > 0 km s^-1.
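The definition in Equation (1) can be sketched numerically. The following is a minimal illustration only, assuming a normalised flux array sampled on an increasing grid of blueshift velocities in km s^-1; the function name, argument names and the simple rectangle-rule integration are illustrative assumptions and are not taken from Weymann et al. (1991) or Trump et al. (2006).

```python
import numpy as np

def balnicity_index(v, f, v_lo=3000.0, v_hi=25000.0, min_width=2000.0):
    """Rectangle-rule sketch of Equation (1).

    v : blueshift velocities in km/s, increasing (assumed input format)
    f : normalised flux at each velocity
    """
    v = np.asarray(v, dtype=float)
    f = np.asarray(f, dtype=float)
    keep = (v >= v_lo) & (v <= v_hi)      # integration limits of Eq. (1)
    v, f = v[keep], f[keep]

    bi = 0.0
    run_start = None                      # velocity where the current trough began
    for k in range(len(v) - 1):
        if f[k] < 0.9:
            if run_start is None:
                run_start = v[k]
            # C switches to 1 only after f < 0.9 has held for min_width
            if v[k] - run_start >= min_width:
                bi += (1.0 - f[k] / 0.9) * (v[k + 1] - v[k])
        else:
            run_start = None              # trough ended: C returns to 0
    return bi
```

With this convention the first 2000 km s^-1 of each trough contributes nothing, so a trough narrower than that always yields BI = 0 regardless of its depth.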

Physically, the idea behind the BI is to count as BALs only absorption troughs that
are definitely real (hence the requirement that f(v) < 0.9), definitely broad (hence
the demand that troughs must be broader than 2000 km s^-1 in order to count) and
significantly blue-shifted (hence the lower limit of 3000 km s^-1 on the integral).
The main attraction of the BI as a classification tool is that it tends to produce
very "clean" BALQSO samples. Indeed, it is hard to imagine a non-BAL QSO being
assigned a positive BI unless its spectrum is either very noisy, suffers from a
misplaced continuum, or has been assigned an erroneous redshift. However, the
conservative nature of the BI also means that BALQSO samples based on it may be
seriously incomplete. There is certainly no compelling reason to think that somewhat
weaker, narrower and/or less-blue-shifted BALs than recognised by the BI should not
exist.

1 It is worth noting that in the original definition of the BI by Weymann et al.
(1991), f(v) is normalised relative to the underlying continuum, whereas other
authors, including Trump et al. (2006), normalise relative to a best-fitting
continuum plus emission line template.

This issue was already recognised by Weymann et al. (1991) and provided the
motivation for the introduction of the AI, initially by Hall et al. (2002, here
purely as a means of identifying systems showing evidence of absorption). The
definition of the AI ultimately adopted by Trump et al. (2006) is

\[
\mathrm{AI} = \int_{0}^{29000} \left[ 1 - f(v) \right] C' \, dv, \tag{2}
\]

where f(v) is the normalized flux obtained after dividing the data by the
best-fitting emission-line-plus-continuum QSO template. The constant C' = 1 in all
regions where f(v) < 0.9 continuously for at least 1000 km s^-1 and C' = 0
otherwise. Also, only regions containing at least one data point significantly below
the underlying continuum are included in the calculation. This ensures that only
true absorption features are assigned positive AI. The two key differences that
allow some objects with BI = 0 km s^-1 to achieve AI > 0 km s^-1 are that (i) the AI
includes regions within 3000 km s^-1 of line centre (and also regions beyond
25,000 km s^-1), and (ii) the AI includes objects with much narrower absorption
troughs than the BI. The remaining difference is associated with the absence of the
factor 0.9 in Equation (2) (compared to Equation 1). This change was made to the
definition of the AI in order to allow a clear interpretation: the AI is the
combined equivalent width of all absorption troughs in a given line that are located
bluewards of line centre, dip below 0.9 of the continuum level, and are at least
1000 km s^-1 wide.
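For comparison with the BI, Equation (2) can be sketched in the same way. This is an illustrative reading only: the function and argument names are assumptions, the input is assumed to be a normalised flux on an increasing grid of blueshift velocities in km s^-1, and the additional requirement that a trough contain at least one point significantly below the continuum is omitted because it needs per-pixel flux errors.

```python
import numpy as np

def absorption_index(v, f, v_lo=0.0, v_hi=29000.0, min_width=1000.0):
    """Sketch of Equation (2): summed equivalent width of qualifying troughs."""
    v = np.asarray(v, dtype=float)
    f = np.asarray(f, dtype=float)
    keep = (v >= v_lo) & (v <= v_hi)
    v, f = v[keep], f[keep]
    below = f < 0.9

    ai = 0.0
    k, n = 0, len(v)
    while k < n:
        if not below[k]:
            k += 1
            continue
        j = k
        while j + 1 < n and below[j + 1]:
            j += 1                         # extend the contiguous trough k..j
        if v[j] - v[k] >= min_width:       # C' = 1 over the whole trough
            depth = 1.0 - f[k:j + 1]
            dv = np.diff(v[k:j + 1])
            ai += np.sum(0.5 * (depth[1:] + depth[:-1]) * dv)  # trapezoid rule
        k = j + 1
    return ai
```

Unlike the BI sketch, a qualifying trough contributes over its full width, and troughs within 3000 km s^-1 of line centre are counted. These are the two features that let an object have AI > 0 km s^-1 while BI = 0 km s^-1.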

Note that both the AI and the BI can be sensitive 
to the type of spectrum from which they are measured. 
For example, an apparently broad absorption trough in a 
low-resolution spectrum may break up into multiple nar- 
row troughs when observed at higher resolution. Conversely, 
noise spikes may artificially break up a single trough, so that 
a true BAL could be assigned zero AI/BI in a noisy spec- 
trum. Throughout this paper, we will use Trump et al.'s 
(2006) AI/BI estimates for objects in the SDSS DR3 QSO 
catalogue. The health warning "as derived from its SDSS 
spectrum" should thus implicitly be added to the AI/BI es- 
timates we use for each QSO. 

It is obvious that if BALQSOs are classified on the basis of the less restrictive
AI, the resulting BALQSO fraction will be higher than if the BI were used. However,
it is not obvious a priori that objects selected solely on the basis of having
AI > 0 km s^-1 (i.e. including those with BI = 0 km s^-1) constitute a single
population. The problem is that a wide variety of non-BAL absorption features are
commonly seen in QSOs and other AGN. These typically narrower features can be due to
absorption at an intermediate redshift along the line of sight to the QSO,
absorption within the host galaxy, or intrinsic absorption close to the QSO
(including the so-called mini-BALs and associated absorption features) whose origin
remains poorly understood and could conceivably be linked to the broad absorption
lines. It is therefore extremely difficult to say if any particular QSO containing
an "intermediate" width absorption trough (1000 km s^-1 < Δv < 3000 km s^-1) should
be classified as a BALQSO or not. Roughly speaking, the BI metric does not consider
any such objects to be genuine BALQSOs, whereas the AI metric labels all such
objects as BALQSOs. In the following section, we will present statistical evidence
that the AI metric, in particular, is far too permissive in this respect.



3 THE BIMODAL LOG(AI) DISTRIBUTION 
OF AI-SELECTED QSOS 

Using the definitions above, Trump et al. (2006) calculated AIs and BIs for all
11,611 QSOs in the SDSS DR3 sample. In Figure 1, we show as a black histogram the
log AI distribution of the 3182 QSOs with AI > 0 km s^-1 and in the redshift
interval 1.90 < z < 4.36 (so as to contain C IV). This distribution is clearly
bimodal, with one peak near 500 km s^-1 and another around 3000 km s^-1.

In order to confirm and quantify the bimodality, we have applied the KMM algorithm
of Ashman, Bird & Zepf (1994). This effectively compares the quality of a single
Gaussian fit to a distribution to that of a double Gaussian one. The probability
that the overall log AI distribution is unimodal turns out to be negligible: the
KMM likelihood ratio test statistic (essentially a χ²) is 590 for 4 degrees of
freedom. This is vastly in excess of the value of about 4 one would expect for a
unimodal distribution.

The decomposition suggested by KMM is shown in the top panel of Figure 1. While
there is no a priori reason to expect the log AI distribution to be intrinsically
Gaussian (or double Gaussian), the two normal components provide quite a reasonable
description of the distribution. More specifically, KMM suggests that the low-AI
group contributes 49.9% of the total AI > 0 km s^-1 population and is centered on
AI ~ 500 km s^-1 with σ ~ 0.2 dex; the high-AI group contributes 50.1% and is
centered on AI ~ 3000 km s^-1 with σ ~ 0.3 dex.
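The kind of comparison described above can be sketched as follows. This is not the KMM code of Ashman, Bird & Zepf (1994); it is a generic expectation-maximization fit of a two-Gaussian mixture with free widths, written under the assumption that the input is an array of log AI values. All function names are illustrative.

```python
import numpy as np

def log_gauss(x, mu, sigma):
    """Log of a normal density; broadcasts over x, mu and sigma."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def fit_two_gaussians(x, n_iter=300):
    """EM fit of a two-component 1-D Gaussian mixture (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mu = np.array([x[x <= med].mean(), x[x > med].mean()])  # crude starting guess
    sigma = np.array([x.std(), x.std()])
    w = np.array([0.5, 0.5])

    def e_step(w, mu, sigma):
        logp = np.log(w) + log_gauss(x[:, None], mu, sigma)   # shape (n, 2)
        log_norm = np.logaddexp(logp[:, 0], logp[:, 1])
        return np.exp(logp - log_norm[:, None]), log_norm.sum()

    for _ in range(n_iter):
        resp, _ll = e_step(w, mu, sigma)       # membership probabilities
        nk = resp.sum(axis=0)
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        sigma = np.maximum(sigma, 1e-6)        # guard against a collapsing component
    resp, loglike = e_step(w, mu, sigma)
    return w, mu, sigma, resp, loglike

def likelihood_ratio(x):
    """2 * (lnL of two Gaussians - lnL of one Gaussian)."""
    x = np.asarray(x, dtype=float)
    ll_one = log_gauss(x, x.mean(), x.std()).sum()
    ll_two = fit_two_gaussians(x)[-1]
    return 2.0 * (ll_two - ll_one)
```

The array `resp` returned by `fit_two_gaussians` holds, for each object, the probability of belonging to each component; its second dimension is what a soft classification into low-AI and high-AI groups would be based on.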

In the middle panel of Figure 1, we also show the AI distributions of all objects
with BI > 0 km s^-1 (red histogram) and of all quasars with BI = 0 km s^-1 but
AI > 0 km s^-1 (blue histogram). This shows that the two modes exhibited by the
AI-selected quasar population correspond fairly closely to "classic" BALQSOs
(high-AI mode; BI > 0 km s^-1), on the one hand, and newly added objects (low-AI
mode; BI = 0 km s^-1), on the other. The BI metric classifies 41.2% of the
AI > 0 km s^-1 objects as BALQSOs. In general, the match of the KMM-suggested groups
to the BI = 0 km s^-1 and BI > 0 km s^-1 groups is good, except near the overlap
region. The KMM decomposition suggests that the BI-selected sample may be seriously
incomplete in this regime.

It should be acknowledged at this point that "bimodal- 
ity" turns out to be a surprisingly slippery concept on closer 
examination. In particular, the number of modes in a distri- 
bution over a certain variable is not always invariant under 
simple transformations of that variable. This explains why 
the bimodality in the logAI distribution was not noticed 
by Trump et al. (2006), who only inspected the (linear) AI 
distribution. As it turns out, that distribution is, in fact, 
unimodal. 

So does the bimodal log AI distribution actually provide evidence for two distinct
QSO sub-populations? It does, because not every unimodal distribution can be
transformed into a bimodal one via a logarithmic transformation. Thus while the
concept of bimodality should perhaps be replaced by that of "bimodalizability"
(Wyszomirski 1992), it remains true that distinct sub-populations are the most
obvious way of producing such bimodalizable distributions. Indeed, in our case, the
evidence for two distinct QSO sub-populations can be seen even in the linear AI
distribution. In Figure 2, we compare the AI and log AI distributions directly. Even
though the AI distribution is unimodal, it is obvious that the characteristic scale
on which the distribution drops off changes abruptly at around AI ~ 1700 km s^-1,
which coincides with the dip between the two modes of the log AI distribution. We
thus believe that the evidence for two distinct sub-populations in the overall
distribution is robust (see footnote 2).

Figure 1. The log AI distribution of objects with AI > 0 km s^-1 (black histograms
in all panels). Note the obvious bimodality of this distribution. Top panel: The
decomposition of the distribution suggested by the KMM algorithm. Middle panel: The
decomposition resulting if the classic balnicity index (BI) is used to classify
BALQSOs. Bottom panel: The decomposition resulting if a hybrid method involving
learning vector quantization (LVQ) is used to classify BALQSOs (see Section 5).

Inspection of the linear AI distribution suggests that, beyond its mode at around
AI ~ 400 km s^-1 (which corresponds to the low-AI mode of the log AI distribution),
the drop-off in each of the two distinct regimes is roughly exponential. As shown in
Figure 2, we have therefore fit a double exponential model to this distribution for
AI > 500 km s^-1. Note that, as expected, this unimodal two-population model for the
AI distribution produces a bimodal distribution in log AI (Figure 2, top right
panel). Since this double-exponential model imposes no low-AI cut-off at all on the
sub-population that dominates at high AI, it allows us to set a useful upper limit
on the size of this population (see Sections 5.3 and 6.4).

2 Since our paper was accepted, Nestor, Hamann & Rodriguez Hidalgo (2008) have also
found an excess of strong absorbers in a study focused mainly on relatively narrow
C IV absorption line systems (see their Figure 9). We suspect this excess may be
directly associated with the high-AI, BALQSO mode of the log AI distribution.
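A short worked derivation may help to show why a monotonically falling double exponential can look bimodal in log AI. The amplitudes A_1, A_2 and scale lengths s_1, s_2 below are illustrative symbols introduced here; they are not the fitted values, which are not quoted in this section.

```latex
% Illustrative double-exponential model; A_i and s_i are assumed symbols.
\begin{align}
n(\mathrm{AI}) &= A_1\, e^{-\mathrm{AI}/s_1} + A_2\, e^{-\mathrm{AI}/s_2},
  \qquad s_1 < s_2, \\
\frac{dN}{d\log_{10}\mathrm{AI}} &= (\ln 10)\,\mathrm{AI}\; n(\mathrm{AI})
  = (\ln 10)\left[ A_1\,\mathrm{AI}\, e^{-\mathrm{AI}/s_1}
                 + A_2\,\mathrm{AI}\, e^{-\mathrm{AI}/s_2} \right], \\
\frac{d}{d\mathrm{AI}}\left( \mathrm{AI}\, e^{-\mathrm{AI}/s_i} \right)
  &= \left( 1 - \frac{\mathrm{AI}}{s_i} \right) e^{-\mathrm{AI}/s_i} = 0
  \;\Longrightarrow\; \mathrm{AI} = s_i .
\end{align}
```

The linear density n(AI) decreases everywhere, so it has no interior mode, whereas each term of the log-space density peaks at its own scale length with height proportional to A_i s_i. Two peaks appear in log AI only if s_1 and s_2 are sufficiently well separated and the two peak heights are comparable, so bimodality in log AI is possible for this model but not guaranteed.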



4 BEYOND STATISTICS: REPRESENTATIVE 
SPECTRA ACROSS THE AI/BI 
PARAMETER SPACE 

It is important to relate the statistical results of the previous 
section to specific spectral properties of individual QSOs. 
What type of objects do we select when we apply AI and/or 
BI metrics, and what type of spectra correspond to different 
combinations of AI and BI? 

The top row in Figure 3 shows the C IV line profiles of four QSOs belonging to the
low-AI mode in Figure 1 (AI ~ 500; BI = 0 km s^-1). To our eyes, none of these QSOs
appear to be genuine BALs (see footnote 3). We have visually inspected the majority
of similar objects and find that the same is true for most of them. Objects with
AI > 0 km s^-1 and BI = 0 km s^-1 comprise about half of the population with
AI > 0 km s^-1, so this population is certainly not representative of "classic"
BALQSOs.

The second row from the top in Figure 3 shows objects selected from the high-AI mode
in Figure 1 (AI ~ 3000; BI > 0 km s^-1). As expected, all exhibit the strong and
broad absorption features that are characteristic of "classic" BALQSOs.

The third row from the top in Figure 3 shows a selection of BI = 0 km s^-1 objects
from the overlap region in Figure 1 (AI ~ 1000 - 3000 km s^-1). It is immediately
clear that these intermediate-width absorption line objects can indeed be difficult
to classify with confidence. However, we have also included in this row two objects
(SDSS J1730 and SDSS J1042) that appear to be genuine BALQSOs that have been missed
by the BI.

The bottom row in Figure 3 shows more objects from the overlap region in Figure 1
(AI ~ 1000 - 3000 km s^-1), but now with BI > 0 km s^-1. While these are clearly
harder to classify than those in the high-AI mode, we think the BI has done a good
job of assigning these objects to the BALQSO class.

In our view, the results of the previous and present sections imply that, although
not perfect, the BI is a better metric for BALQSO identification than the AI. The
majority of objects with positive BI are clearly genuine BALQSOs, but the same
cannot be said with any confidence of objects classified solely on the basis of
positive AI. The AI is certainly very good at finding absorbing systems, including
essentially all BALQSOs. However, the spectroscopic properties of objects with
BI = 0 km s^-1 but AI > 0 km s^-1, as well as the bimodality in the AI > 0 km s^-1
population, suggest that purely AI-selected BALQSO samples will be strongly
contaminated by objects with properties that are clearly distinct from those of
"classic" BALQSOs (cf. Ganguly et al. 2007).

3 It should be acknowledged, however, that our organic neural networks have also
been trained primarily on BI-selected BALQSOs.

Figure 2. Comparison of the AI distribution (left panels) and the log AI
distribution (right panels) of objects with AI > 0 km s^-1. Note that only the
log AI distribution is bimodal, but that the AI distribution exhibits two distinct
characteristic scale lengths in the low-AI and high-AI regimes. Thus both types of
distribution provide evidence of two distinct sub-populations, each of which
dominates in one of these regimes. In the top panels, we also show a maximum
likelihood fit to the AI distribution above AI = 500 km s^-1 with a double
exponential model. In the bottom panels, we again show the KMM decomposition from
Figure 1, which corresponds to a double Gaussian in log AI (and a double log-normal
distribution in AI).

The fact remains, however, that BI-selected BALQSO samples may themselves be
seriously incomplete. In Section 3, we showed that the BI criterion selects 41.2% of
QSOs with AI > 0 km s^-1 as BALQSOs, whereas the KMM decomposition of the log AI
distribution implies a significantly higher percentage of 50.1%. Similarly, we have
now found specific examples of QSOs with BI = 0 km s^-1 that, visually, would seem
to be excellent BALQSO candidates (e.g. SDSS J1730 and SDSS J1042 in Figure 3). None
of this should come as a surprise. As discussed in Section 2, there is simply no
physical reason to expect that all genuine BALQSOs should have C IV absorption
troughs that extend for at least 2000 km s^-1 beyond the arbitrary 3000 km s^-1
starting point adopted in the definition of the BI.

We conclude that BALQSO fractions derived from AI-selected samples are strong
overestimates, whereas those derived from BI-selected samples are at least mild
underestimates. In the following section, we will use two new methods to determine
observed BALQSO fractions that are more robust than AI-based estimates and more
complete than BI-based ones.



5 THE OBSERVED BALQSO FRACTION IN 
SDSS DR3 

The fundamental problem with simple metrics such as the AI and the BI is their
rigidity. For example, the BI will firmly reject an object with an absorption trough
whose width is marginally less than 2000 km s^-1, even if this trough looks
virtually indistinguishable from many objects that the BI does classify as BALQSOs.
One way to avoid this incompleteness is to relax the classification criteria, but
this incurs the danger of producing many false positives. This is what appears to
have happened in the switch from the BI to the AI.

In order to overcome these problems, we have used two new approaches to estimate the
observed BALQSO fraction in SDSS DR3. The first approach is based directly on the
KMM decomposition of the log AI distribution in Figure 1, whereas the second
approach is a hybrid method that employs a BI-trained neural network algorithm -
learning vector quantization (LVQ) - to flag potentially misclassified objects for
visual inspection. We also use a third approach - a decomposition based on the
double-exponential model for the AI distribution described in Section 3 - to
estimate an upper limit on the observed BALQSO fraction. The feature common to all
three approaches is that they are fundamentally more flexible than the AI or BI
metrics.

Figure 3. Representative spectra for various parts of the AI/BI parameter space.
Each row corresponds to objects from a distinct region of this parameter space. Top
row: Objects belonging to the low-AI mode in Figure 1 (BI = 0 km s^-1; AI ~ 500).
Second row (from top): Objects belonging to the high-AI mode in Figure 1
(BI > 0 km s^-1; AI ~ 5000). Third row (from top): BI = 0 km s^-1 objects belonging
to the overlap region in Figure 1 (AI ~ 2000 km s^-1). Bottom row: BI > 0 km s^-1
objects belonging to the overlap region in Figure 1 (AI ~ 2000 km s^-1).

5.1 KMM-based decomposition 

The KMM-based approach is straightforward. As discussed in Section 3 and shown in
Figure 1 (top panel), the log AI distribution of QSOs with AI > 0 km s^-1 can be
decomposed fairly cleanly into two Gaussian components. This decomposition can be
used immediately to assign a probability to each object of belonging to one or the
other group. The KMM algorithm we have used provides these probabilities
automatically for each object. A raw, observed BALQSO fraction can therefore be
estimated from this decomposition as

\[
f_{\mathrm{BALQSO}} = \frac{1}{N_{\mathrm{QSO}}} \sum_{i=1}^{N_{\mathrm{QSO}}} P_{i,\mathrm{BALQSO}}, \tag{3}
\]

where P_i,BALQSO is the KMM-assigned probability that quasar i is a BALQSO (i.e.
that it belongs to the high-AI mode of the distribution).

The main weakness of this method is that it assumes the KMM decomposition to be
correct. This is almost certainly not true in detail. Just as there is no reason to
think that every BALQSO trough is at least 2000 km s^-1 wide, there is no a priori
reason to assume that the log AI distribution of BALQSOs is exactly Gaussian.
However, Figure 1 suggests that a Gaussian distribution may be quite a good
approximation to the true log AI distribution of BALQSOs. The great strength of the
decomposition approach is that it provides a very complete statistical census of
BALQSOs (subject to its underlying assumption).

Applying this method to the full DR3 QSO sample 
in the redshift range 1.90 < z < 4.36 yields an observed 
BALQSO fraction of 13.7% ± 0.3% (where the error only accounts
for Poisson statistics). Note that this observed global
fraction is still subject to selection biases. These are dealt 
with in Section 6. 
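As a rough consistency check, Equation (3) and the quoted Poisson error can be reproduced from numbers given in the text. The sketch below assumes that QSOs with AI = 0 km s^-1 contribute zero probability, that the summed probabilities equal the high-AI mixture weight (50.1%) times the 3182 objects with AI > 0 km s^-1, and that the Poisson error is the square root of that effective count divided by the parent sample size. These are assumptions about the bookkeeping, not a description of the authors' code.

```python
import math

def soft_count_fraction(p_bal, n_qso):
    """Equation (3) with a simple Poisson error on the effective count.

    p_bal : membership probabilities for the objects with non-zero probability
    n_qso : size of the parent QSO sample
    """
    n_eff = sum(p_bal)                    # effective number of BALQSOs
    frac = n_eff / n_qso
    err = math.sqrt(n_eff) / n_qso        # assumed form of the Poisson error
    return frac, err

# Stand-in probabilities: every AI > 0 object gets the mean high-AI weight.
frac, err = soft_count_fraction([0.501] * 3182, 11611)
print(f"{100 * frac:.1f}% +/- {100 * err:.1f}%")
```

Under these assumptions the result rounds to 13.7% ± 0.3%, in agreement with the value quoted above.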

5.2 A hybrid method using learning vector 
quantization 

In our second approach, we use a hybrid method to classify BALQSOs. Starting with a
BI-based classification, we use a machine learning algorithm called Learning Vector
Quantization (LVQ) to identify objects that might have been misclassified by the BI.
All such objects are then inspected and classified visually. We will refer to this
hybrid method as "LVQ-based" throughout this paper. However, it should be kept in
mind that we do not use LVQ as a stand-alone BALQSO classifier, but as part of a
process involving the BI, LVQ and visual inspection.

LVQ was originally devised by Kohonen (2001) and uses a neural network to assign new
input data to pre-defined classes. LVQ is a particularly simple supervised neural
network, in which each neuron is simply tagged as belonging to a particular class.
The basic idea behind LVQ is that, through training, each neuron should come to
represent a characteristic type of object within its class. New inputs can then be
assigned to classes on the basis of maximum similarity to a particular neuron.

In our case, the relevant classes are BALs vs non-BALs, and the data are
continuum-normalised QSO spectra between λλ1400-1700 Å (spanning the C IV line).
Normalization is performed exactly as in North, Knigge & Goad (2006), and the
measure of similarity we use when comparing spectra and neurons is the Euclidean
distance between them (i.e. the mean rms residual). Note that we allow for redshift
errors in all comparisons.

We use 800 QSOs as our training set, with 400 BI > 0 km s^-1 objects initially
representing the BALQSOs and 400 BI = 0 km s^-1 objects initially representing the
non-BAL QSOs. The network is then iteratively trained to classify the objects in the
training set in line with their input classifications. Even though these input
classifications are purely BI-based, the converged network already manages to
classify some BI = 0 km s^-1 objects in the training set as likely BALQSOs (and some
BI > 0 km s^-1 objects as likely non-BAL QSOs). This is possible because the network
classifications are based on spectral similarity, not on the BI itself. In order to
reinforce this feature of the network, we inspect all of the "misclassified" objects
in the training set visually and retag them if appropriate. We then carry out a full
second training run, where the training set now includes BI = 0 km s^-1 objects
explicitly tagged as BALQSOs (and vice versa). The converged network produced by
this second training run is our final LVQ machine classifier. All 11,611 DR3 QSOs in
the relevant redshift range are passed to this network, resulting in an LVQ
classification for each of them.
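The basic training loop can be illustrated with the simplest variant of the algorithm, usually called LVQ1. The sketch below is a generic implementation under stated assumptions and is not the network used here: the number of neurons per class, the learning-rate schedule and the function names are all illustrative, and the allowance for redshift errors mentioned above is not included.

```python
import numpy as np

def train_lvq1(X, y, n_per_class=2, lr=0.05, epochs=30, seed=0):
    """Generic LVQ1: each prototype ('neuron') carries a fixed class tag."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    protos, tags = [], []
    for c in np.unique(y):
        members = X[y == c]
        idx = rng.choice(len(members), size=n_per_class, replace=False)
        protos.append(members[idx])        # initialise from training examples
        tags += [c] * n_per_class
    W = np.vstack(protos)
    tags = np.array(tags)

    for epoch in range(epochs):
        rate = lr * (1.0 - epoch / epochs)             # decaying learning rate
        for i in rng.permutation(len(X)):
            dist = np.linalg.norm(W - X[i], axis=1)    # Euclidean similarity
            k = np.argmin(dist)                        # winning neuron
            sign = 1.0 if tags[k] == y[i] else -1.0
            W[k] += sign * rate * (X[i] - W[k])        # attract if right, repel if wrong
    return W, tags

def classify_lvq(W, tags, X):
    """Assign each input the class tag of its nearest prototype."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return tags[np.argmin(dist, axis=1)]
```

Because classification depends only on distance to the nearest prototype, an input can be assigned to a class that differs from its training tag. This is the property that lets a BI-trained network flag BI = 0 km s^-1 spectra as BAL-like.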

As already noted above, we do not use LVQ as a stand-alone
BALQSO classifier, but as a tool to flag borderline
cases where the LVQ and BI classifications disagree. All such
cases are then inspected and classified visually, and the
visual classification is adopted as final. In practice, LVQ
classified 524 BI = 0 km s$^{-1}$ objects as BALQSOs, of which 334
were also classified as BALQSOs visually. Thus LVQ was
quite good at identifying BI = 0 km s$^{-1}$ BALQSOs. However,
LVQ also classified 383 objects with BI > 0 km s$^{-1}$
as non-BAL QSOs, and only 95 of these were also classified
as non-BAL QSOs visually. This underlines the importance
of the visual inspection step and justifies our reluctance to
use LVQ as a stand-alone BALQSO classifier. As explained
above, whenever we refer to "LVQ-based" quantities below,
we will always mean quantities calculated on the basis of
the full hybrid method, which uses the BI, LVQ and visual
inspection.
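The decision logic of the hybrid method can be sketched as follows. This is a minimal illustration only: the function names, the boolean input flags and the `inspect` callback (a stand-in for the human visual inspection step) are assumptions introduced here, not part of the original analysis. The counts are those quoted in the text.

```python
from typing import Callable, List, Sequence


def hybrid_classify(bi_is_bal: Sequence[bool],
                    lvq_is_bal: Sequence[bool],
                    inspect: Callable[[int], bool]) -> List[bool]:
    """Keep the classification where BI and LVQ agree;
    defer to visual inspection (hypothetical callback) where they disagree."""
    final = []
    for idx, (bi, lvq) in enumerate(zip(bi_is_bal, lvq_is_bal)):
        if bi == lvq:
            final.append(bi)            # both methods agree
        else:
            final.append(inspect(idx))  # borderline case: visual verdict wins
    return final


def confirmation_rate(n_confirmed: int, n_flagged: int) -> float:
    """Fraction of LVQ-flagged disagreements confirmed by visual inspection."""
    return n_confirmed / n_flagged


# Counts quoted in the text.
rate_bal = confirmation_rate(334, 524)     # BI = 0 objects that LVQ called BALQSOs
rate_nonbal = confirmation_rate(95, 383)   # BI > 0 objects that LVQ called non-BAL
print(f"{rate_bal:.0%} {rate_nonbal:.0%}")  # prints "64% 25%"
```

The asymmetry between the two rates is what motivates treating LVQ as a flagging tool rather than as a stand-alone classifier.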

Our LVQ-based method classifies 1,557 of the 11,611
QSOs in our DR3 parent sample as BALQSOs. The
LVQ-based decomposition of AI > 0 objects into BALQSOs
and non-BAL QSOs is shown in the bottom panel of
Figure 3. The LVQ-based observed BALQSO fraction is
13.4% ± 0.3%, which is consistent with the KMM-based
estimate.
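As a quick check of the arithmetic, the quoted fraction and its uncertainty follow directly from the two counts. The sketch below assumes that the error is the simple Poisson term sqrt(N_BAL)/N_total; this is one reading of "Poisson statistics", and the function name is illustrative.

```python
import math


def fraction_with_poisson_error(n_bal: int, n_total: int):
    """Observed fraction and its Poisson error, assuming err = sqrt(n_bal) / n_total."""
    frac = n_bal / n_total
    err = math.sqrt(n_bal) / n_total
    return frac, err


frac, err = fraction_with_poisson_error(1557, 11611)
print(f"{100 * frac:.1f}% +/- {100 * err:.1f}%")  # prints "13.4% +/- 0.3%"
```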



4 A catalogue providing the KMM-assigned probabilities and
LVQ-based classifications is available in electronic form from
http://www.astro.soton.ac.uk/~simo.



5.3 Double exponential decomposition 

Our third and final approach is based on the double
exponential model for the (linear) AI distribution described in
Section 3 and shown in Figure 2. If we associate the
exponential that dominates at high AI values with BALQSOs, we
can use this model to estimate BALQSO fractions in the
same way as for the KMM-based decomposition. It is worth
emphasizing that this model assumes that there is no
low-AI cut-off at all in the true BALQSO population - even
QSOs with no absorption at all can be "BALQSOs" in this
case. The observed turn-over in the AI distribution below
AI ~ 400 km s$^{-1}$ must then be due to incompleteness. This
is not entirely unreasonable, since the definition of the AI
imposes a lower limit of 100 km s$^{-1}$ and only counts
absorption troughs that dip below true continuum.

While this is quite an extreme model in our opinion,
it is impossible to rule out with the present data. We have
therefore also estimated an "observed" BALQSO fraction
on the basis of this double-exponential decomposition. This
effectively provides an upper limit on the BALQSO fraction.
Based on the model shown in Figure 2, we find that the
exponential dominating at high-AI values corresponds to an
observed BALQSO fraction of 18.3% ± 0.4% (where the
errors are again purely based on Poisson statistics). Note that
this estimate includes QSOs with estimated AI = 0 km s$^{-1}$
that are not part of Trump et al.'s (2006) AI-based BALQSO
catalogue. If we (somewhat arbitrarily) exclude such objects,
the observed BALQSO fraction is 17.2% ± 0.4%. As
explained above, we consider these estimates to be upper
limits on the observed BALQSO fraction. It is therefore worth
noting that even the estimate which includes AI = 0 km s$^{-1}$
objects lies substantially below the 26% BALQSO fraction
suggested by Trump et al. (2006) based on the number of
QSOs with AI > 0 km s$^{-1}$.
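To make the idea of "the fraction associated with one exponential" concrete, the sketch below integrates a single component of the form A exp(-AI/s) analytically. It is purely illustrative: the amplitude and scale are made-up numbers, not the fitted parameters of the model in the text, and the function names are assumptions.

```python
import math


def component_count(amplitude: float, scale: float, ai_min: float = 0.0) -> float:
    """Number of objects in one component, i.e. the integral of
    amplitude * exp(-AI / scale) over AI from ai_min to infinity."""
    return amplitude * scale * math.exp(-ai_min / scale)


def component_fraction(amplitude: float, scale: float,
                       n_parent: int, ai_min: float = 0.0) -> float:
    """Fraction of the parent QSO sample attributed to this component."""
    return component_count(amplitude, scale, ai_min) / n_parent


# Hypothetical high-AI component: 0.5 objects per km/s at AI = 0, e-folding 3000 km/s.
frac_all = component_fraction(0.5, 3000.0, n_parent=11611)
# Same component, counting only objects above an (arbitrary) AI threshold.
frac_cut = component_fraction(0.5, 3000.0, n_parent=11611, ai_min=100.0)
print(f"{frac_all:.3f} {frac_cut:.3f}")
```

Because the component is assumed to extend all the way to AI = 0, imposing any lower AI threshold can only reduce the inferred fraction, which is why the model acts as an upper limit.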



6 THE INTRINSIC BALQSO FRACTION 

The observed BALQSO fractions we have derived in the
previous section do not provide a fair measure of the intrinsic
incidence of BALs within the QSO population. This is
because the SDSS QSO sample suffers from a variety of
selection effects that affect BALQSOs differently from non-BAL
QSOs. The impact of the resulting biases can be seen in
Figure 4, which shows that the observed BALQSO fractions
depend strongly on redshift. As we shall see, this redshift
dependence is mainly due to selection effects (cf. Reichard
et al. 2003).

In the following subsections, we first construct a more
homogenously selected QSO sample and then correct the
observed BALQSO fraction derived from it for colour-,
magnitude- and redshift-dependent biases. Finally, we put
all of these results together to produce an unbiased estimate
of the intrinsic BALQSO fraction.

6.1 A homogenous QSO parent sample 

The SDSS DR3 QSO catalogue contains objects selected
via a variety of selection criteria (Schneider et al. 2005).
We therefore create a more homogenous QSO sample by
retaining only those objects that were (or would have been)
selected for spectroscopic follow-up by the final QSO
targeting algorithm (as described by Richards et al. 2002). This
leaves us with 7,487 QSOs (out of 11,611) in our redshift
range. The observed BALQSO fractions in this homogenous
sample are 14.0% ± 0.4% (KMM-based) or 14.1% ± 0.4%
(LVQ-based), but still exhibit a strong redshift dependence
due to selection effects.

Figure 4. The redshift distribution of the observed BALQSO
fraction. Red points correspond to the fractions determined with
the KMM-based approach (see Section 5.1); black points
correspond to the fractions obtained from the LVQ-based approach
(see Section 5.2). No correction for selection effects has been
applied to these fractions. Error bars on the KMM fractions have
been suppressed for clarity, but are always similar to the LVQ
error bars.

The SDSS QSO selection algorithm actually consists of
two parallel strands, one aimed at creating a "main" QSO
sample, the other aimed specifically at finding high redshift
QSOs.^5 The two strands use different limiting i'-magnitudes
and colour selection criteria, which must be taken into
account when dealing with the resulting selection biases. Of
the 7,487 objects in our homogenous sample, 5,134 would
have been selected by the main sample selection criteria and
4,145 by the high-redshift QSO selection criteria (1,792 QSOs
satisfied both sets of criteria).
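The three counts are mutually consistent, as a simple inclusion-exclusion check shows. The variable names below are illustrative.

```python
n_main = 5134    # selected by the main-sample criteria
n_highz = 4145   # selected by the high-redshift criteria
n_both = 1792    # selected by both strands

n_union = n_main + n_highz - n_both   # objects selected by at least one strand
n_main_only = n_main - n_both
n_highz_only = n_highz - n_both

print(n_union, n_main_only, n_highz_only)  # prints "7487 3342 2353"
```

The distinction between "main only" and "any high-z" objects matters later, because the two groups are subject to different limiting magnitudes.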
Figure 5. LVQ-based BALQSO and non-BAL QSO composites
and the corresponding K-correction in the i'-band. Top Panel:
The blue line shows the BALQSO composite after dereddening
and scaling it to optimally match the non-BAL QSO composite
(black line). The red line shows the original BALQSO spectrum,
but scaled so that its normalization relative to the non-BAL
spectrum is in line with the difference in reddening/extinction between
them. Bottom Panel: Each line shows the redshift-dependent
i'-band magnitude difference between the correctly scaled BALQSO
and non-BAL QSO composites, for a particular assumed value
of the differential reddening/extinction between them. The solid
dark line corresponds to our preferred reddening estimate; the
dotted thin lines are based on our estimate of the uncertainty on
this.



6.2 Limiting-magnitude bias 

There are two reasons why a magnitude cut may affect
BALQSOs differently from non-BAL QSOs. First, BAL
troughs may be redshifted into the bandpass where the
magnitude cut is applied, causing BALQSOs to appear fainter
than otherwise identical non-BAL QSOs. Second, the
continuum spectral energy distributions (SEDs) of BALQSOs
are reddened with respect to those of non-BAL QSOs. As
already shown by Reichard et al. (2003), the form of this
reddening is consistent with extinction by SMC-like dust. This
again means that BALQSOs will be fainter than otherwise
similar non-BAL QSOs. The consequence of these effects
is that any magnitude cut will disproportionately remove
BALQSOs from the sample.

5 Strictly speaking, there is also a third strand, since objects
with FIRST radio counterparts are also preferentially targeted.
However, in order to correct for optical colour- and
magnitude-dependent biases, we need a sample with rigorous optical selection
criteria. We therefore do not include QSOs targeted solely on the
basis of radio emission in our homogenous QSO sample.

In order to correct for this, we first construct BALQSO
and non-BAL QSO composites and estimate the difference
in reddening/extinction between them. Note that we use
geometric mean composites, which ensures that the spectral
index and reddening of each composite correspond to the
arithmetic mean of the spectral indices and reddening values
of the spectra used to construct the composite (Reichard et
al. 2003). The absolute flux densities of the composites are
arbitrary, however, since all individual spectra are scaled to
an average value of unity in a reference wavelength interval
near 1700 Å.

6 The composites used in this section were constructed using the
LVQ-based samples; the equivalent KMM-based composites are
virtually identical.
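The property of geometric mean composites mentioned above is easy to verify numerically for pure power-law spectra. The sketch below uses toy spectra with made-up spectral indices; the wavelength grid and the exact reference window around 1700 Å are assumptions chosen for illustration.

```python
import numpy as np

wave = np.linspace(1300.0, 3000.0, 200)        # rest wavelength grid (Angstrom)
ref = (wave > 1650.0) & (wave < 1750.0)        # reference window near 1700 A (assumed width)
alphas = np.array([-2.0, -1.5, -1.0, -0.2])    # hypothetical spectral indices, F ~ wave**alpha

# Build toy power-law spectra, each scaled to unit mean flux in the reference window.
spectra = []
for alpha in alphas:
    flux = wave ** alpha
    spectra.append(flux / flux[ref].mean())
spectra = np.array(spectra)

# Geometric mean composite: the exponential of the mean log-flux at each wavelength.
geo_composite = np.exp(np.log(spectra).mean(axis=0))

# The log-log slope of the composite equals the arithmetic mean of the input indices.
slope = np.polyfit(np.log(wave), np.log(geo_composite), 1)[0]
print(slope, alphas.mean())
```

The individual normalizations only shift the composite up or down in log-flux, so they do not affect the recovered slope.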

As shown in Figure 5 (top panel), dereddening the
BALQSO composite by E(B - V) = 0.03 ± 0.01 and
rescaling produces a good match to the non-BAL QSO
composite longward of ~ 1600 Å (i.e. away from any major BAL
troughs). This is consistent with the findings of Reichard et
al. (2003). We therefore scale the original BALQSO
composite so that its normalization relative to the non-BAL QSO
composite is in line with our estimate of the difference in
extinction between them (Figure 5, red line). Finally, we carry
out synthetic photometry to determine the i'-band
magnitude difference between BALQSOs and non-BAL QSOs
as a function of redshift. This "differential K-correction" is
shown in the bottom panel of Figure 5. The sharp upturn
around z ~ 3.5 corresponds to the first major BAL trough
(C IV 1550 Å) being redshifted into the i'-band. At the
lower redshifts we will mostly be interested in below, the
K-correction is only due to extinction.
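The synthetic photometry step can be sketched as below. This is a simplified illustration under stated assumptions: the i'-band is approximated by a hypothetical top-hat response between 6900 and 8200 Å rather than the real filter curve, the composites are toy power laws, and the flux integral is photon-weighted. None of the names come from the original analysis.

```python
import numpy as np


def trapezoid(y: np.ndarray, x: np.ndarray) -> float:
    """Simple trapezoidal integration of y(x)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def differential_k_correction(rest_wave, bal_flux, nonbal_flux, z,
                              band_wave, band_trans) -> float:
    """Magnitude difference (BALQSO minus non-BAL) through one band at redshift z.
    Positive values mean that the BALQSO composite is fainter."""
    obs_wave = rest_wave * (1.0 + z)                       # shift to the observed frame
    bal = np.interp(band_wave, obs_wave, bal_flux)         # resample onto the band grid
    nonbal = np.interp(band_wave, obs_wave, nonbal_flux)
    weight = band_trans * band_wave                        # photon-counting weight
    ratio = trapezoid(bal * weight, band_wave) / trapezoid(nonbal * weight, band_wave)
    return -2.5 * np.log10(ratio)


# Toy inputs: a power-law non-BAL composite and a BALQSO composite that is
# uniformly 20 per cent fainter (grey extinction, no troughs).
rest_wave = np.linspace(1000.0, 4000.0, 600)
nonbal = rest_wave ** -1.5
bal = 0.8 * nonbal
band_wave = np.linspace(6900.0, 8200.0, 50)   # hypothetical top-hat stand-in for i'
band_trans = np.ones_like(band_wave)

delta_i = differential_k_correction(rest_wave, bal, nonbal, 2.0, band_wave, band_trans)
print(round(delta_i, 3))
```

With real composites, evaluating this function on a grid of redshifts would trace out a curve like the one described for the bottom panel of Figure 5, including the upturn once a trough enters the band.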

We can now estimate a corrected BALQSO fraction in
any redshift bin as

$$
f_{\mathrm{BALQSO}} \simeq \frac{N_{\mathrm{BALQSO}}}{N_{\mathrm{BALQSO}} + N_{\text{non-BAL QSO}}\left(i' < \left[i'_{\mathrm{lim}} - \Delta i'(z)\right]\right)},
$$

where $i'_{\mathrm{lim}}$ is the limiting magnitude imposed by the
selection algorithm ($i'_{\mathrm{lim}} = 19.1$ for any QSO selected only via
the main sample strand; $i'_{\mathrm{lim}} = 20.2$ for QSOs identified
by the high-z colour selection). The quantity $\Delta i'(z) > 0$
is the differential K-correction. For sufficiently narrow
redshift bins, this could be approximated as constant within
each bin, but it is just as easy (and more precise) to
calculate the K-correction independently for each non-BAL QSO
according to its exact redshift. Note that $N_{\mathrm{BALQSO}}$ and
$N_{\text{non-BAL QSO}}$ become sums over probabilities when
calculating $f_{\mathrm{BALQSO}}$ from the probabilistically-defined KMM
sample (cf. Equation 3).
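A minimal sketch of this estimator for a single redshift bin is given below. The function signature and the toy magnitudes are assumptions for illustration; `delta_i` stands for any callable returning the differential K-correction at a given redshift.

```python
import numpy as np


def corrected_fraction(n_balqso: float, nonbal_imag, nonbal_z,
                       i_lim: float, delta_i) -> float:
    """Corrected BALQSO fraction in one redshift bin.

    n_balqso    : number (or summed probability) of BALQSOs in the bin
    nonbal_imag : i' magnitudes of the non-BAL QSOs in the bin
    nonbal_z    : their redshifts
    i_lim       : limiting magnitude of the relevant selection strand
    delta_i     : callable, z -> differential K-correction (mag)
    """
    nonbal_imag = np.asarray(nonbal_imag, dtype=float)
    nonbal_z = np.asarray(nonbal_z, dtype=float)
    # Keep only non-BAL QSOs brighter than the effective limit for BALQSOs.
    keep = nonbal_imag < (i_lim - delta_i(nonbal_z))
    n_nonbal = int(np.count_nonzero(keep))
    return n_balqso / (n_balqso + n_nonbal)


# Toy bin: 2 BALQSOs and 4 non-BAL QSOs near the main-sample limit.
mags = [18.0, 18.95, 19.05, 19.2]
zs = [2.0, 2.0, 2.0, 2.0]
f_raw = corrected_fraction(2, mags, zs, 19.1, lambda z: np.zeros_like(z))
f_corr = corrected_fraction(2, mags, zs, 19.1, lambda z: np.full_like(z, 0.1))
print(f_raw, f_corr)  # prints "0.4 0.5"
```

The K-correction is evaluated per object, so no assumption of a constant correction within the bin is needed.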

Our correction for limiting-magnitude bias is similar to
that applied by Hewett & Foltz (2003). It should produce
reasonable results, provided that the intrinsic BALQSO and
non-BAL QSO luminosity functions do not exhibit sharp
breaks near the limiting absolute magnitude in any given
redshift bin. One limitation of our approach is that it does
not account for variations in K-correction associated with
variations in BAL strength. However, BAL troughs only
affect the K-correction beyond $z \gtrsim 3.5$, and in this regime we
also do not have a reliable correction for colour-selection bias
(see Section 6.3 and Figure 6). We therefore simply restrict
our attention to lower redshifts, $z \lesssim 3.5$.

6.3 Colour-selection bias 

Both the main and high-z strands of the SDSS targeting 
algorithm select QSO candidates on the basis of their op- 
tical photometric colours. In both strands, QSO candidates 
are identified as outliers from the locus defined by normal 
stars in the 5-dimensional SDSS colour space (u'g'r'i'z'). The 
completeness of the resulting QSO samples is a function of 
redshift, since even a fixed intrinsic SED produces different 
observed colours when placed at different redshifts. 



Figure 6. The redshift-dependent correction factor for
colour-selection bias. This was derived by Reichard et al. (2003; see their
Fig. 10), and needs to be applied to the observed BALQSO
fraction.



None of this would matter for the derivation of the
intrinsic BALQSO fraction if the SEDs of BALQSOs and
non-BAL QSOs were identical (at least in a statistical sense).
Unfortunately, they are not. First, whenever a deep BAL trough
is shifted into a particular waveband, all colours involving
that band are changed. Second, as discussed in Section 6.2
and shown in Figure 5, the continuum SEDs of BALQSOs
are reddened compared to those of non-BAL QSOs. The
upshot of these colour differences is that the efficiency of
the SDSS QSO selection algorithm(s) is not the same for
BALQSOs as for non-BAL QSOs.

Fortunately, Reichard et al. (2003) have already
derived a redshift-dependent correction factor that can be
applied to the observed BALQSO fraction to account for this
colour-selection bias. In order to determine this correction,
Reichard et al. created large sets of simulated QSO and
BALQSO colours and passed both through the SDSS QSO
selection algorithm. The resulting correction factor is shown
as a function of redshift in Figure 6.

Three key points should be noted regarding this
correction for colour-selection bias (see Reichard et al. 2003
for a full discussion). First, it is only approximate. One
important limitation is that all of the simulated BALQSO
colours used to derive the correction factor were based on
the colour differences between a HiBALQSO composite and
an average QSO composite. Thus variations in BALQSO
colours arising from the range of observed BAL strengths
are not properly accounted for. It should also be kept in
mind that the HiBALQSO composite used by Reichard et
al. was based on a different definition of what constitutes
a BALQSO than the LVQ- or KMM-based definition used
here. Second, the correction factor is significantly greater
than unity near z ~ 2.5, but much less than unity near
z ~ 2.8. Thus the colour-selection bias causes BALQSOs to
be under-represented around z ~ 2.5, but over-represented
around z ~ 2.8 (this explains the spike at this redshift in
Figure 4). Third, the correction factor is close to unity for
$z \lesssim 2.2$ and $3.0 \lesssim z \lesssim 3.4$. These redshift ranges are thus
optimal for estimating the intrinsic BALQSO fraction.

Figure 7. The redshift distribution of the intrinsic BALQSO
fraction (after correcting for selection effects). Red points
correspond to the fractions determined with the KMM-based approach
(see Section 5.1); black points correspond to the fractions
obtained from the LVQ-based approach (see Section 5.2). Note that
the redshift dependence of the intrinsic fractions is markedly
reduced compared to that of the observed fractions (cf. Figure 4)
and that the residual undulations are strongly correlated with the
correction factor for colour-selection bias (Figure 6). The
horizontal lines near the bottom of the plot mark redshift ranges where
colour-selection bias is negligible. Our final estimate of the
intrinsic BALQSO fraction is derived from only those regions
and is shown as the solid horizontal line. The dark (light) shaded
regions correspond to our estimate of the statistical (systematic)
uncertainty on this number.

6.4 Putting it all together 

Let us summarize all of the steps we have taken so far.
First, we assigned a BALQSO or non-BAL QSO
classification to every object in the redshift range 1.90 < z < 4.36
in the SDSS DR3 QSO catalog.^7 Next, we created a more
homogenous sample by removing all objects that were not
selected by the SDSS QSO targeting algorithm. We then
accounted for limiting-magnitude bias by removing every
non-BAL QSO that is fainter than the effective
(dereddened) magnitude limit for a BALQSO at the same redshift.
Finally, we applied a redshift-dependent correction for the
colour-selection bias imposed by the QSO targeting
algorithm.
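The final step in this chain can be sketched as a simple interpolation of a tabulated correction factor. The sketch assumes that the factor is applied multiplicatively to the magnitude-corrected fraction in each redshift bin, which is consistent with a factor above unity where BALQSOs are under-represented; the tabulated values below are invented for illustration and are not the Reichard et al. (2003) curve.

```python
import numpy as np


def apply_colour_correction(z_bins, fractions, z_grid, factor_grid):
    """Multiply per-bin BALQSO fractions by a redshift-dependent correction
    factor, linearly interpolated from a tabulated curve (assumed multiplicative)."""
    factor = np.interp(z_bins, z_grid, factor_grid)
    return factor * np.asarray(fractions, dtype=float)


# Hypothetical correction curve and magnitude-corrected fractions.
z_grid = np.array([2.0, 2.5, 3.0])
factor_grid = np.array([1.0, 1.4, 1.0])
z_bins = np.array([2.0, 2.25, 3.0])
fractions = np.array([0.15, 0.10, 0.16])

corrected = apply_colour_correction(z_bins, fractions, z_grid, factor_grid)
print(corrected)
```

Bins where the factor is close to unity are left essentially unchanged, which is why such redshift ranges are the safest ones from which to read off the intrinsic fraction.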

The final product of all these steps - and the main
result of this paper - is the intrinsic BALQSO fraction
plotted in Figure 7. Two key points are worth noting from this
straightaway. First, the agreement between the KMM- and
LVQ-based BALQSO fractions is extremely good across the
whole redshift range. This adds to our confidence that we
are measuring the intrinsic abundance of a consistent class
of objects. Second, the intrinsic BALQSO fractions show
much less variability with redshift than the observed
fractions (cf. Figure 4), although some residual "wiggles"
remain. Comparing Figures 6 and 7 immediately suggests that
these wiggles are due to an imperfect correction for
colour-selection bias. More specifically, the redshift dependence of
the colour-correction factor is positively correlated with that
of the intrinsic BALQSO fraction, so the correction derived
by Reichard et al. (2003) appears to be somewhat too strong
at most redshifts. We therefore do not believe that there is
evidence for genuine evolution in $f_{\mathrm{BALQSO}}$ with redshift.

7 In the case of KMM, we assign a BALQSO probability.

As suggested in Section 6.3, we derive our final estimate
of the intrinsic BALQSO fraction from the restricted redshift
ranges 1.9 < z < 2.2 and 3.0 < z < 3.4. These are largely
free of colour-selection bias and produce consistent results.
Our best estimate of the intrinsic BALQSO fraction from
these regions is $f_{\mathrm{BALQSO}}$ = 0.17 ± 0.01 (stat) ± 0.03 (sys).
The statistical error here is just due to number statistics.
The systematic error accounts for the uncertainty on the
differential K-correction and for alternative choices in
constructing the parent sample and selecting optimal redshift
ranges.

Finally, we also estimate an upper limit on the
intrinsic BALQSO fraction, based on the double exponential
decomposition described in Section 5.3. The upper limit on
the observed BALQSO fraction suggested by this
decomposition was 18.3%, approximately 1.35 times larger than
our preferred estimates of 13.7% (KMM) and 13.4% (LVQ).
Since there is no evidence for a redshift dependence, we
estimate an upper limit on the intrinsic fraction by applying
the same factor to our best estimate of this fraction. The
resulting upper limit is then $f_{\mathrm{BALQSO}}$ = 0.23.
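The scaling argument can be reproduced with a few lines of arithmetic. The sketch below assumes that the factor of roughly 1.35 is the average of the two ratios between the upper-limit and preferred observed fractions; the variable names are illustrative.

```python
f_upper_obs = 18.3          # per cent, double exponential decomposition
f_kmm_obs = 13.7            # per cent, KMM-based
f_lvq_obs = 13.4            # per cent, LVQ-based
f_intrinsic = 0.17          # best estimate of the intrinsic fraction

ratio_kmm = f_upper_obs / f_kmm_obs
ratio_lvq = f_upper_obs / f_lvq_obs
mean_ratio = 0.5 * (ratio_kmm + ratio_lvq)   # close to the quoted factor of 1.35

f_intrinsic_upper = f_intrinsic * mean_ratio
print(f"{mean_ratio:.2f} {f_intrinsic_upper:.2f}")  # prints "1.35 0.23"
```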



7 DISCUSSION AND CONCLUSIONS 

Determining the "true" BALQSO fraction is a challenging
task. A large part of the problem is the ambiguity one
often encounters when attempting to classify individual
absorption features as BALs or otherwise. The first goal of
the present work has been to shed light on this
classification problem. In this context, we have shown that when the
recently introduced "absorption index" (AI) is used to
classify BALQSOs, the resulting log AI distribution is clearly
bimodal. Both modes contain comparable numbers of
objects, but only the high-AI mode is clearly associated with
genuine BALQSOs. Thus recent AI-based estimates of the
BALQSO fraction - 26% (observed; Trump et al. 2006) or
43% (intrinsic; Dai, Shankar & Sivakoff 2008) - are likely to
be seriously overestimated.

However, there are also good reasons to believe that
the traditional "balnicity index" (BI) produces incomplete
BALQSO samples. In order to make progress, we have
therefore used two complementary new approaches to derive
observed BALQSO fractions. One is based on a statistical
decomposition of the log AI distribution, the other is a hybrid
method in which a BI-trained neural network flags likely
mis-identifications for visual inspection. Both approaches
yield an observed BALQSO fraction around 13.5% for the
SDSS DR3 QSO catalog (in the range 1.90 < z < 4.36).
This number should be more reliable than AI-based ones
and more complete than purely BI-based ones. We also
estimate an upper limit on the observed fraction of 18.3%, based
on a decomposition of the AI distribution that allows even
objects without any absorption to be classified as BALQSOs.

This observed fraction is still subject to serious
selection effects. We have therefore explained in detail how the
observed BALQSO fraction can be corrected for colour-,
magnitude- and redshift-dependent selection biases. Along
the way, we confirmed that BALQSOs have redder SEDs
than non-BALs, consistent with extinction by SMC-like dust
at a level of E(B - V) = 0.03 ± 0.01.

After applying all corrections, there is no compelling
evidence for redshift evolution in the intrinsic BALQSO
fraction. Our final estimate of the global intrinsic BALQSO
fraction is then $f_{\mathrm{BALQSO}}$ = 0.17 ± 0.01 (stat) ± 0.03 (sys), with
an upper limit of $f_{\mathrm{BALQSO}}$ = 0.23. As expected, this is
similar to, but slightly higher than, the BI-based estimates from
the SDSS EDR (Reichard et al. 2003). It is also similar to
recent BI-based estimates (Hewett & Foltz 2003; Dai, Shankar
& Sivakoff 2008) and consistent with the BALQSO fraction
measured by Maddox et al. (2008) from a K-band selected
QSO sample.

In closing, we would like to comment on the relation- 
ship between BALQSOs and what might be called "absorp- 
tion line QSOs" (ALQSOs; this includes all objects display- 
ing some form of absorption, such as BALs, mini-BALs, 
associated absorption features, narrow absorption lines...). 
Based primarily on the bimodality of the log AI
distribution, we have argued throughout this paper that BALQSOs
represent a phenomenologically distinct class amongst
the ALQSOs. However, this does not imply that BALs and
other absorption features must be produced in physically 
distinct line-forming regions. After all, orientation effects 
alone can dramatically alter the appearance of lines formed 
in non-spherical outflows from accretion disks (see, for ex- 
ample, Hamann, Korista & Morris [1993], Murray et al. 
[1995], or, in a different context, Knigge et al. [1995], Long 
& Knigge [2002]). Indeed, in the QSO unification scheme of 
Elvis (2000), both broad and narrow absorption lines are ex- 
plicitly assumed to be formed in the same disk wind. In our 
view, it is likely that many, if not most, of the absorption 
(and perhaps also emission) line signatures seen in AGN and 
QSOs are formed in such accretion disk winds. We therefore 
agree with Ganguly & Brotherton (2008) that a comprehen- 
sive look at a wide range of outflow tracers is required in 
order to develop a full empirical picture of these disk winds. 

The empirical distinctions between objects exhibiting
different kinds of outflow tracers are important clues in this
process. For example, if BALQSOs and other ALQSOs are
literally "the same thing viewed from different angles", it
could be highly relevant that they occupy distinct modes
of the log AI distribution. In the context of
orientation-based unification schemes, a restricted AI range
for BALQSOs would probably imply that the BAL-forming
region of the outflow has clearly delineated physical
boundaries. This would ensure that there is little room for overlap
between sightlines looking into this part of the outflow (and
seeing a BAL) and sightlines looking across it (and seeing
only narrower absorption features). However, this
conclusion cannot yet be considered robust, since different viable
decompositions of the AI distribution can produce different
AI ranges for BALQSOs.